[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture #11233
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Makes sense to me, but right now we only use one GPU per process. What's the use case for having another GPU here?
17e8bd3 to 837cd72
We are using vLLM as a library and were hitting this. It seems like good practice not to assume a single GPU, especially if it doesn't cost anything.
837cd72 to 00b14f5
Thanks for the contribution! We can merge it as long as CI passes.
Head branch was pushed to by a user without write access
93fac08 to 87d4caf
…graph-capture Signed-off-by: Yan Burman <[email protected]> Signed-off-by: Ido Asraff <[email protected]>
My PR was failing on unrelated checks, which now seem to be fixed in main. Could you please merge?
Until now, tensors and tiles were created on the default device. Capturing a CUDA graph on any other device therefore failed, because the data lived on a different GPU than the one used for capture.
This PR fixes the problem by initializing the data on the correct device.
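
The idea described above can be illustrated with a minimal PyTorch sketch. This is not the code changed in this PR: the helper `make_capture_inputs` and the tensor names `input_ids` and `positions` are hypothetical and chosen only for illustration. The sketch assumes the fix amounts to passing an explicit device when allocating placeholder tensors, and it falls back to the CPU so that it also runs on machines without a GPU.

```python
import torch


def make_capture_inputs(batch_size: int, device: torch.device) -> dict:
    """Hypothetical helper: allocate placeholder inputs explicitly on `device`."""
    return {
        "input_ids": torch.zeros(batch_size, dtype=torch.long, device=device),
        "positions": torch.zeros(batch_size, dtype=torch.long, device=device),
    }


# Prefer a non-default GPU when one exists; otherwise fall back so the
# sketch still runs on single-GPU or CPU-only machines.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    target = torch.device("cuda:1")
elif torch.cuda.is_available():
    target = torch.device("cuda:0")
else:
    target = torch.device("cpu")

# Passing device="cuda" (no index) would resolve to the *current* CUDA device,
# which is cuda:0 unless it has been changed. Passing `target` explicitly keeps
# the data on the device that will actually be used.
inputs = make_capture_inputs(8, target)
print({name: str(t.device) for name, t in inputs.items()})
```

When vLLM is used as a library inside a process that drives several GPUs, threading the worker's own device through every allocation like this avoids relying on whatever the current CUDA device happens to be.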